---
title: "HW04_ploter"
output: html_notebook
---

# Description

### Data Preprocessing

First, I transformed the nominal columns into dummy variables.
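A minimal, self-contained sketch of this step (toy data, not the homework set; the actual chunk appears in the Code section):

```{r}
# step_dummy() one-hot encodes every nominal predictor,
# dropping one reference level per variable.
library(recipes)
toy <- tibble::tibble(color = c("red", "green", "blue"), y = 1:3)
recipe(y ~ ., data = toy) |>
  step_dummy(all_nominal_predictors()) |>
  prep() |>
  bake(new_data = NULL)
# columns include color_green and color_red ("blue" is the reference level)
```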

### Classical Approach with Stepwise Selection

I applied a stepwise selection method in both directions (using AIC as the criterion) to identify significant predictors. This process highlighted the following predictors: X9, X2, X1, X14, X6 and X7.

I plotted residuals against each predictor and observed non-linear dependencies for predictors X1, X2, and X9.

### Lasso Regularization

To further explore predictor selection, I also tried Lasso regularization, which identified a slightly different set of predictors: X1, X2, X6, X8, X9, and color_green.

Both the classical approach and Lasso regularization showed overlap in selected predictors, particularly X1, X2, X9, and X6.

### Baseline Model

I set up naive baselines: linear models using the predictor sets from the classical approach and from Lasso regularization. They achieved RMSE values of 4.304524 and 4.308791, respectively, which are practically identical.

### GAM Model

To capture the non-linear relationships in the data, I fitted a Generalized Additive Model (GAM) with the following specifications:

- B-splines (`splines::bs`, cubic by default) for predictor X1, with 7 knots placed at equal quantiles.
- Natural splines (`splines::ns`) for predictor X2 with 6 degrees of freedom.
- A smoothing spline (`s()`) for predictor X9.
- Linear terms for X14, X6, and X7.

After fitting the GAM, I again plotted residuals against the predictors and observed that X2 and X9 still exhibited some non-linear patterns.

### Cross-Validation

Finally, I performed 10-fold cross-validation to evaluate the model and identify the best-performing configuration.


# Code

```{r}
library(tidyverse)
library(tidymodels)
library(leaps)
library(GGally)
library(splines)
library(broom)
```


```{r}
set.seed(39692)

df <- read_delim("hw4_train.csv", delim = ",")
```

```{r}
summary(df)
```

# Feature selection

## Classical approach: Automatic selection of predictors for linear models

```{r}
rec = recipe(y~.,data=df)|>step_dummy(all_nominal_predictors())

df_baked = rec |> prep(df) |> bake(new_data=NULL)

head(df_baked)
```


```{r}
null_model <- lm(y ~ 1, data = df_baked)

full_model <- lm(y ~ ., data = df_baked)

stepwise_model <- stats::step(null_model, 
                       scope = list(lower = null_model, upper = full_model), 
                       direction = "both", 
                       trace=0,
                       #k = log(nrow(df_baked))
                       )  #k = log(n) is sometimes referred to as BIC or SBC.

summary(stepwise_model)
```
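The commented-out `k = log(nrow(df_baked))` above switches the selection criterion from AIC to BIC. A sketch of that variant (same bidirectional search; BIC typically selects a sparser model):

```{r}
# k = log(n) makes the stepwise penalty correspond to BIC,
# which penalizes model size more heavily than AIC.
stepwise_bic <- stats::step(null_model,
                            scope = list(lower = null_model, upper = full_model),
                            direction = "both",
                            trace = 0,
                            k = log(nrow(df_baked)))
formula(stepwise_bic)
```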


```{r}
qqnorm(residuals(stepwise_model), main="QQ-Plot of Residuals")
qqline(residuals(stepwise_model), col="red")

# ggplot2 option for a more refined plot

ggplot(data = data.frame(residuals = residuals(stepwise_model)), aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  ggtitle("QQ-Plot of Residuals") +
  theme_minimal()
```

```{r}

plot_residuals_fn = function(m) {
  selected_vars <- all.vars(formula(m))[-1]

  continuous_vars <- selected_vars[sapply(df_baked[selected_vars], is.numeric)]
  
  for (var in continuous_vars) {
    p <- ggplot(m, aes(x = .data[[var]], y = .resid)) +
      geom_point() +
      geom_smooth(method = "loess", color = "blue") +
      labs(title = paste("Residuals vs", var), x = var, y = "Residuals")
    print(p)
  }
}

plot_residuals_fn(stepwise_model)

```

## Regularization -- Lasso

```{r}
cv_data=vfold_cv(df_baked, v=10)

rec_reg_1 = recipe(y~.,data=df_baked)

wf_reg_1=workflow()|> add_model(linear_reg(penalty=tune(),mixture=1)|>set_engine("glmnet"))|>add_recipe(rec_reg_1)
res_reg_1=tune_grid(wf_reg_1,cv_data,grid=data.frame(penalty=seq(0.02,0.2,by=0.02)))
show_best(res_reg_1)
```

```{r}
wf_reg_1_final = finalize_workflow(wf_reg_1, select_best(res_reg_1, metric = "rmse"))

wf_reg_1_final = wf_reg_1_final|>fit(df_baked)
wf_reg_1_final
```


```{r}
wf_reg_1_select <- workflow()|>add_recipe(rec_reg_1)|>add_model(linear_reg(engine="glmnet",mixture=1,penalty=0.08))
wf_reg_1_select_fitted = fit(wf_reg_1_select, df_baked)
selected_vars = tidy(wf_reg_1_select_fitted)|>filter(estimate!=0)|>slice(-1)|>pull(term)
selected_vars
```
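To see the order in which predictors enter as the penalty relaxes, the fitted glmnet object can be pulled out of the workflow and its coefficient path plotted (a sketch, assuming the fitted workflow from the chunk above):

```{r}
# plot.glmnet draws one curve per coefficient against log(lambda);
# label = TRUE annotates each curve with its variable index.
wf_reg_1_select_fitted |>
  extract_fit_engine() |>
  plot(xvar = "lambda", label = TRUE)
```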



# Model comparison



```{r}
set.seed(20240926)
splitted=initial_split(df_baked,prop=0.8)
train_df=training(splitted)
test_df=testing(splitted)

cv_data=vfold_cv(train_df, v=10)
```



## Baseline -- naive linear models


```{r}
wf_baseline_1 = workflow()|>
  add_recipe(recipe(y ~ X1 + X2 + X9 + X14 + X6 + X7, data=train_df))|>
  add_model(linear_reg())
res_baseline_1=fit_resamples(wf_baseline_1, cv_data,metrics = metric_set(rmse))
collect_metrics(res_baseline_1)
```

```{r}
last_fit_result=last_fit(wf_baseline_1,splitted)
last_fit_result|>collect_metrics()
```


Next, fit a model based on the predictors selected by Lasso regularization.

```{r}
head(df_baked)
```


```{r}
# "X1"          "X2"          "X6"          "X8"          "X9"          "color_green"
wf_baseline_2 = workflow()|>
  add_recipe(recipe(y ~ X1 + X2 + X9  + X6 + X8 + color_green, data=train_df))|>
  add_model(linear_reg())
res_baseline_2=fit_resamples(wf_baseline_2, cv_data,metrics = metric_set(rmse))
collect_metrics(res_baseline_2)
```

```{r}
last_fit_result=last_fit(wf_baseline_2,splitted)
last_fit_result|>collect_metrics()
```
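A sketch for comparing the two baselines side by side, assuming the `res_baseline_1` and `res_baseline_2` objects from the chunks above:

```{r}
# Stack both cross-validated RMSE estimates into one table.
bind_rows(
  collect_metrics(res_baseline_1) |> mutate(model = "stepwise predictors"),
  collect_metrics(res_baseline_2) |> mutate(model = "lasso predictors")
) |>
  select(model, .metric, mean, std_err)
</imports>
```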


## GAM

```{r}
# knots below taken from quantile(df_baked$X1, probs = seq(0.1, 0.9, length.out = 7)); see the next chunk
m_gam = gen_additive_mod(mode="regression") |> fit(
  y ~ splines::bs(X1, knots = c(1.776703,  3.097543,  4.128191,  5.065609,  6.022864,  7.034800,  8.375370)) + splines::ns(X2, df = 6) + s(X9) + X7 + X6 + X14,
  data=df_baked
  )
#add predictions to data and compute correct residuals
m_gam_aug=augment(m_gam, new_data=df_baked)
```

```{r}
quantile(df_baked$X1, probs = seq(0.1, 0.9, length.out = 7))
```
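The knots above are hardcoded; a sketch of computing them from the data instead, so the spline definition tracks the training set if it changes:

```{r}
# Interior knots at seven equally spaced quantiles of X1.
knots_x1 <- quantile(df_baked$X1, probs = seq(0.1, 0.9, length.out = 7))
m_gam_q <- gen_additive_mod(mode = "regression") |> fit(
  y ~ splines::bs(X1, knots = knots_x1) + splines::ns(X2, df = 6) + s(X9) +
    X7 + X6 + X14,
  data = df_baked
)
```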



```{r}
ggplot(m_gam_aug, aes(x=X1))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```
```{r}
ggplot(m_gam_aug, aes(x=X2))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```
```{r}
ggplot(m_gam_aug, aes(x=X9))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```

```{r}
ggplot(m_gam_aug, aes(x=X14))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```
```{r}
ggplot(m_gam_aug, aes(x=X6))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```



```{r}
ggplot(m_gam_aug, aes(x=X7))+geom_point(aes(y=.resid))+geom_smooth(aes(y=.resid))
```

```{r}
wf_gam_1 = workflow()|>
  add_recipe(recipe(y~.,data=train_df))|>
  add_model(
    gen_additive_mod(mode="regression"),
    formula=y ~ splines::bs(X1, knots = c(1.776703,  3.097543,  4.128191,  5.065609,  6.022864,  7.034800,  8.375370)) + splines::ns(X2, df = 6) + s(X9) + X7 + X6 + X14
    )
res_gam_1=fit_resamples(wf_gam_1, cv_data,metrics = metric_set(rmse))
collect_metrics(res_gam_1)
```


```{r}
wf_gam_1_result=last_fit(wf_gam_1,splitted)
wf_gam_1_result|>collect_metrics()
```

```{r}
wf_gam_2 = workflow()|>
  add_recipe(recipe(y~.,data=train_df))|>
  add_model(
    gen_additive_mod(mode="regression"),
    formula=y ~ splines::bs(X1, knots = c(1.776703,  3.097543,  4.128191,  5.065609,  6.022864,  7.034800,  8.375370)) + splines::ns(X2, df = 6) + s(X3) + X7 + X6 + X14
    )
res_gam_2=fit_resamples(wf_gam_2, cv_data,metrics = metric_set(rmse))
collect_metrics(res_gam_2)
```

```{r}
wf_gam_2_result=last_fit(wf_gam_2,splitted)
wf_gam_2_result|>collect_metrics()
```
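To wrap up, a sketch that gathers the held-out test metrics of both GAM variants into one table (assuming the `last_fit` results from the chunks above):

```{r}
# Side-by-side test-set RMSE of the two GAM specifications.
bind_rows(
  collect_metrics(wf_gam_1_result) |> mutate(model = "GAM with s(X9)"),
  collect_metrics(wf_gam_2_result) |> mutate(model = "GAM with s(X3)")
) |>
  filter(.metric == "rmse") |>
  select(model, .estimate)
```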

